81 research outputs found

    Machine Learning Approaches to Identify Nicknames from A Statewide Health Information Exchange

    Patient matching is essential to minimize fragmentation of patient data. Existing patient matching efforts often do not account for nickname use. We sought to develop decision models that could identify true nicknames using features representing the phonetic and structural similarity of nickname pairs. We identified potential male and female name pairs from the Indiana Network for Patient Care (INPC), and developed a series of features that represented their phonetic and structural similarities. Next, we used the XGBoost classifier and hyperparameter tuning to build decision models to identify nicknames using these feature sets and a manually reviewed gold standard. Decision models reported high Precision/Positive Predictive Value and Accuracy scores for both male and female name pairs despite the low number of true nickname matches in the datasets under study. Ours is one of the first efforts to identify patient nicknames using machine learning approaches.
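As a rough illustration of what structural similarity features for a name pair might look like, the sketch below computes a few using only Python's standard library. The feature names and the use of `difflib.SequenceMatcher` are assumptions made for illustration; the abstract does not list the actual features, and no classifier is trained here.

```python
from difflib import SequenceMatcher


def name_pair_features(name_a: str, name_b: str) -> dict:
    """Return a few hypothetical structural-similarity features for two names."""
    a, b = name_a.strip().lower(), name_b.strip().lower()
    shorter, longer = sorted((a, b), key=len)
    return {
        # Ratio of matching characters to total characters (0.0 to 1.0).
        "sequence_ratio": SequenceMatcher(None, a, b).ratio(),
        # Many nicknames keep the first letter of the formal name.
        "same_first_letter": bool(a) and bool(b) and a[0] == b[0],
        # True when the shorter name is a prefix of the longer one.
        "is_prefix": longer.startswith(shorter) if shorter else False,
        # Absolute difference in length between the two names.
        "length_difference": len(longer) - len(shorter),
    }


robert_rob = name_pair_features("Robert", "Rob")
william_bill = name_pair_features("William", "Bill")
```

Feature dictionaries like these could be turned into rows of a feature matrix for any classifier; a pair such as William/Bill shows why prefix features alone would not be enough.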

    Public Health Informatics in Local and State Health Agencies: An Update From the Public Health Workforce Interests and Needs Survey

    OBJECTIVE: To characterize public health informatics (PHI) specialists and identify the informatics needs of the public health workforce. DESIGN: Cross-sectional study. SETTING: US local and state health agencies. PARTICIPANTS: Employees from state health agency central offices (SHA-COs) and local health departments (LHDs) participating in the 2017 Public Health Workforce Interests and Needs Survey (PH WINS). We characterized and compared the job roles for self-reported PHI, "information technology specialist or information system manager" (IT/IS), "public health science" (PHS), and "clinical and laboratory" workers. MAIN OUTCOME MEASURE: Descriptive statistics for demographics, income, education, public health experience, program area, job satisfaction, and workplace environment, as well as data and informatics skills and needs. RESULTS: A total of 17 136 SHA-CO and 26 533 LHD employees participated in the survey. PHI specialist was self-reported as a job role among 1.1% and 0.3% of SHA-CO and LHD employees, respectively. The PHI segment most closely resembled PHS employees but had less public health experience and lower salaries. Overall, fewer than one-third of PHI specialists reported working in an informatics program area, often supporting epidemiology and surveillance, vital records, and communicable disease. Compared with PH WINS 2014, current PHI respondents' satisfaction with their job and workplace environment moved toward more neutral and negative responses, while the IT/IS, PHS, and clinical and laboratory subgroups shifted toward more positive responses. The PHI specialists were less likely than those in IT/IS, PHS, or clinical and laboratory roles to report gaps in needed data and informatics skills. CONCLUSIONS: The informatics specialists' role continues to be rare in public health agencies, and those filling that role tend to have less public health experience and be less well compensated than staff in other technically focused positions. Significant data and informatics skills gaps persist among the broader public health workforce.

    Measuring the impact of a health information exchange intervention on provider-based notifiable disease reporting using mixed methods: a study protocol

    Background: Health information exchange (HIE) is the electronic sharing of data and information between clinical care and public health entities. Previous research has shown that using HIE to electronically report laboratory results to public health can improve surveillance practice, yet there has been little utilization of HIE for improving provider-based disease reporting. This article describes a study protocol that uses mixed methods to evaluate an intervention to electronically pre-populate provider-based notifiable disease case reporting forms with clinical, laboratory and patient data available through an operational HIE. The evaluation seeks to: (1) identify barriers and facilitators to implementation, adoption and utilization of the intervention; (2) measure impacts on workflow, provider awareness, and end-user satisfaction; and (3) describe the contextual factors that impact the effectiveness of the intervention within heterogeneous clinical settings and the HIE. Methods/Design: The intervention will be implemented over a staggered schedule in one of the largest and oldest HIE infrastructures in the U.S., the Indiana Network for Patient Care. Evaluation will be conducted utilizing a concurrent design mixed methods framework in which qualitative methods are embedded within the quantitative methods. Quantitative data will include reporting rates, timeliness and burden and report completeness and accuracy, analyzed using interrupted time-series and other pre-post comparisons. Qualitative data regarding pre-post provider perceptions of report completeness, accuracy, and timeliness, reporting burden, data quality, benefits, utility, adoption, utilization and impact on reporting workflow will be collected using semi-structured interviews and open-ended survey items. Data will be triangulated to find convergence or agreement by cross-validating results to produce a contextualized portrayal of the facilitators and barriers to implementation and use of the intervention. Discussion: By applying mixed research methods and measuring context, facilitators and barriers, and individual, organizational and data quality factors that may impact adoption and utilization of the intervention, we will document whether and how the intervention streamlines provider-based manual reporting workflows, lowers barriers to reporting, increases data completeness, improves reporting timeliness and captures a greater portion of communicable disease burden in the community.

    An Adversarial Approach to Enable Re-Use of Machine Learning Models and Collaborative Research Efforts Using Synthetic Unstructured Free-Text Medical Data

    We leverage Generative Adversarial Networks (GANs) to produce synthetic free-text medical data with low re-identification risk, and apply these data to replicate machine learning solutions. We trained GAN models to generate free-text cancer pathology reports. Decision models trained using synthetic datasets reported performance metrics that were statistically similar to those of models trained using original test data. Our results further the use of GANs to generate synthetic data for collaborative research and re-use of machine learning models.

    Evaluating Methods for Identifying Cancer in Free-Text Pathology Reports Using Various Machine Learning and Data Preprocessing Approaches

    Automated detection methods can address delays and incompleteness in cancer case reporting. Existing automated efforts are largely dependent on complex dictionaries and coded data. Using a gold standard of manually reviewed pathology reports, we evaluated the performance of alternative input formats and decision models on a convenience sample of free-text pathology reports. Results showed that the input format significantly impacted performance, and specific algorithms yielded better results for precision, recall and accuracy. We conclude that our approach is sufficiently accurate for practical purposes and represents a generalized process.

    An Evaluation of Two Methods for Generating Synthetic HL7 Segments Reflecting Real-World Health Information Exchange Transactions

    Motivated by the need for readily available data for testing an open-source health information exchange platform, we developed and evaluated two methods for generating synthetic messages. The methods used HL7 version 2 messages obtained from the Indiana Network for Patient Care. Data from both methods were analyzed to assess how effectively the output reflected original 'real-world' data. The Markov Chain method (MCM) used an algorithm based on a transition probability matrix, while the Music Box model (MBM) randomly selected messages of a particular trigger type from the original data to generate new messages. The MBM was faster, generated shorter messages and exhibited less variation in message length. The MCM required more computational power and generated longer messages with more variability in message length. Both methods exhibited adequate coverage, producing a high proportion of messages consistent with original messages. Both methods yielded similar rates of valid messages.
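The general idea behind a transition-probability approach can be sketched as follows, assuming each message is reduced to an ordered list of HL7 segment names (such as MSH, PID, OBX). The function names, the start/end markers, and the toy messages are hypothetical; this is an illustration of a first-order Markov chain, not the authors' implementation.

```python
import random
from collections import Counter, defaultdict

# Hypothetical sentinel states marking the start and end of a message.
START, END = "<start>", "<end>"


def build_transitions(messages):
    """Count how often each segment follows another across all messages."""
    counts = defaultdict(Counter)
    for segments in messages:
        path = [START, *segments, END]
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return counts


def generate_message(transitions, rng, max_segments=50):
    """Walk the chain from START, sampling each next segment by its count."""
    segments, current = [], START
    while len(segments) < max_segments:
        options = transitions[current]
        following = rng.choices(list(options), weights=list(options.values()))[0]
        if following == END:
            break
        segments.append(following)
        current = following
    return segments


# Toy segment sequences standing in for real HL7 v2 messages.
toy_messages = [
    ["MSH", "PID", "OBR", "OBX"],
    ["MSH", "PID", "PV1", "OBR", "OBX", "OBX"],
]
transitions = build_transitions(toy_messages)
synthetic = generate_message(transitions, random.Random(0))
```

Because segments such as OBX can repeat, a chain like this can produce messages longer than any in its input, which is consistent with the longer and more variable output reported for the MCM.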

    Variation in Information Needs and Quality: Implications for Public Health Surveillance and Biomedical Informatics

    Understanding variation among users’ information needs and the quality of information in an electronic system is important for informaticians to ensure data are fit-for-use in answering important questions in clinical and public health. To measure variation in satisfaction with currently reported data, as well as perceived importance and need with respect to completeness and timeliness, we surveyed epidemiologists and other public health professionals across multiple jurisdictions. We observed consensus for some data elements, such as county of residence, which respondents perceived as important and felt should always be reported. However, information needs differed for many data elements, especially when comparing notifiable diseases such as chlamydia to seasonal (influenza) and chronic (diabetes) diseases. Given the trend towards greater volume and variety of data as inputs to surveillance systems, variation of information needs impacts system design and practice. Systems must be flexible and highly configurable to accommodate variation, and informaticians must measure and improve systems and business processes to accommodate variation of both users and information.

    A practical method for predicting frequent use of emergency department care using routinely available electronic registration data.

    Accurately predicting future frequent emergency department (ED) utilization can support a case management approach and ultimately reduce health care costs. This study assesses the feasibility of using routinely collected registration data to predict future frequent ED visits

    A Vision for the Systematic Monitoring and Improvement of the Quality of Electronic Health Data

    In parallel with the implementation of information and communications systems, health care organizations are beginning to amass large-scale repositories of clinical and administrative data. Many nations seek to leverage so-called Big Data repositories to support improvements in health outcomes, drug safety, health surveillance, and care delivery processes. An unsupported assumption is that electronic health care data are of sufficient quality to enable the varied use cases envisioned by health ministries. The reality is that many electronic health data sources are of suboptimal quality and unfit for particular uses. To more systematically define, characterize and improve electronic health data quality, we propose a novel framework for health data stewardship. The framework is adapted from prior data quality research outside of health, but it has been reshaped to apply a systems approach to data quality with an emphasis on health outcomes. The proposed framework is a beginning, not an end. We invite the biomedical informatics community to use and adapt the framework to improve health data quality and outcomes for populations in nations around the world

    Impact of Selective Mapping Strategies on Automated Laboratory Result Notification to Public Health Authorities

    Automated electronic laboratory reporting (ELR) for public health has many potential advantages, but requires mapping local laboratory test codes to a standard vocabulary such as LOINC. Mapping only the most frequently reported tests provides one way to prioritize the effort and mitigate the resource burden. We evaluated the implications of selective mapping on ELR for public health by comparing reportable conditions from an operational ELR system with the codes in the LOINC Top 2000. Laboratory result codes in the LOINC Top 2000 accounted for 65.3% of the reportable condition volume. However, by also including the 129 most frequent LOINC codes that identified reportable conditions in our system but were not present in the LOINC Top 2000, this set would cover 98% of the reportable condition volume. Our study highlights the ways that our approach to implementing vocabulary standards impacts secondary data uses such as public health reporting
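The coverage figures in this abstract are volume-weighted shares: the fraction of all reportable results whose code falls inside a chosen code set. A minimal sketch of that calculation follows; the codes and counts are made-up placeholders, not data from the study.

```python
def volume_coverage(result_counts: dict, mapped_codes: set) -> float:
    """Share of total result volume whose code is in the mapped set."""
    total = sum(result_counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for code, n in result_counts.items() if code in mapped_codes)
    return covered / total


# Placeholder codes and volumes (hypothetical, for illustration only).
counts = {"code-A": 600, "code-B": 300, "code-C": 100}
top_list = {"code-A"}                 # stands in for a general "top N" list
extended = top_list | {"code-B"}      # plus locally frequent reportable codes

baseline_share = volume_coverage(counts, top_list)
extended_share = volume_coverage(counts, extended)
```

Comparing the two shares shows how adding a small number of locally frequent codes to a general-purpose list can raise coverage substantially.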